10 research outputs found

    Markov chain Monte Carlo algorithm for Bayesian policy search

    Get PDF
    The fundamental aim in Reinforcement Learning (RL) is to find optimal parameters of a given parameterized policy. Policy search algorithms have made RL applicable to complex dynamical systems, such as robotic domains, where the environment comprises high-dimensional state and action spaces. Although many policy search techniques are based on the widespread policy gradient methods, thanks to their suitability for such complex environments, their performance can suffer from slow convergence or local optima, because they require computing the gradient components of the parameterized policy. In this study, we take a Bayesian approach to the policy search problem within the RL framework. The problem of interest is to control a discrete-time Markov decision process (MDP) with continuous state and action spaces. We contribute to the field by proposing a Particle Markov Chain Monte Carlo (P-MCMC) algorithm that generates samples of the policy parameters from a posterior distribution instead of performing gradient approximations. To do so, we adopt a prior density over the policy parameters and target the posterior distribution in which the 'likelihood' is taken to be the expected total reward. In risk-sensitive scenarios, where a multiplicative expected total reward is used to measure policy performance rather than its cumulative counterpart, our methodology is fit for purpose, because a reward function in multiplicative form allows sequential Monte Carlo (SMC), also known as the particle filter, to be exploited fully within the iterations of the P-MCMC. These methods have been widely used in statistical and engineering applications in recent years. Furthermore, to deal with the challenging problem of policy search in high-dimensional state spaces, an adaptive MCMC algorithm is proposed. This research is organized as follows. In Chapter 1, we give a general introduction and motivation for the current work and highlight the topics to be covered. In Chapter 2, a literature review relevant to the context of the thesis is conducted. In Chapter 3, a brief review of some popular policy gradient based RL methods is provided. We proceed with the notion of Bayesian inference and present Markov Chain Monte Carlo methods in Chapter 4; the original work of the thesis is formulated in this chapter, where a novel SMC algorithm for policy search in the RL setting is advocated. To demonstrate the effectiveness of the proposed algorithm in learning a parameterized policy, numerical simulations are presented in Chapter 5. To validate the applicability of the proposed method in real time, it is implemented on a control problem for a physical setup of a two-degree-of-freedom (2-DoF) robotic manipulator, with the corresponding results appearing in Chapter 6. Finally, concluding remarks and future work are presented in the closing chapter.
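
    A minimal sketch of the idea, under illustrative assumptions (toy dynamics, a linear state-feedback policy, and a Gaussian prior; none of these are taken from the thesis): because the return is a product of per-step rewards in (0, 1], it can be estimated by the weight/resample recursion of a particle filter and then used as the 'likelihood' inside a Metropolis-Hastings chain over the policy parameters.

```python
import numpy as np

# --- Hypothetical problem pieces (assumptions for illustration, not from the thesis) ---
def initial_state(rng):                      # sample x_0
    return rng.normal(size=2)

def step(x, u, rng):                         # one-step transition x_{t+1} ~ f(. | x_t, u_t)
    return x + 0.1 * u + 0.05 * rng.normal(size=x.shape)

def reward(x, u):                            # per-step reward in (0, 1], used multiplicatively
    return np.exp(-np.sum(x**2) - 0.01 * np.sum(u**2))

def smc_log_return(theta, n_particles=100, horizon=30, rng=None):
    """Particle-filter estimate of log E[prod_t r(x_t, u_t) | theta].
    The multiplicative return acts as an unnormalized likelihood, so the
    standard weight / resample recursion of SMC applies directly."""
    if rng is None:
        rng = np.random.default_rng()
    x = np.stack([initial_state(rng) for _ in range(n_particles)])
    log_z = 0.0
    for _ in range(horizon):
        u = x @ theta                                        # linear policy u = theta . x (assumed)
        w = np.array([reward(xi, ui) for xi, ui in zip(x, u)])
        log_z += np.log(w.mean() + 1e-300)                   # running likelihood estimate
        idx = rng.choice(n_particles, n_particles, p=w / w.sum())   # resample by reward weight
        x = np.stack([step(xi, ui, rng) for xi, ui in zip(x[idx], u[idx])])
    return log_z

def pmcmc_policy_search(n_iters=500, prop_std=0.1, seed=0):
    """Random-walk Metropolis over policy parameters, with the SMC estimate
    standing in for the likelihood (a minimal P-MCMC-style sketch)."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=2)                               # draw from a N(0, I) prior
    log_post = smc_log_return(theta, rng=rng) - 0.5 * np.sum(theta**2)
    samples = []
    for _ in range(n_iters):
        cand = theta + prop_std * rng.normal(size=theta.shape)
        cand_post = smc_log_return(cand, rng=rng) - 0.5 * np.sum(cand**2)
        if np.log(rng.uniform()) < cand_post - log_post:     # MH accept/reject
            theta, log_post = cand, cand_post
        samples.append(theta.copy())
    return np.array(samples)
```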

    Off-Policy Deep Reinforcement Learning Algorithms for Handling Various Robotic Manipulator Tasks

    Full text link
    In order to avoid conventional control methods, which create obstacles due to system complexity and an intense demand for data, more modern and efficient control methods are required. Off-policy, model-free reinforcement learning algorithms help avoid working with complex models. In terms of speed and accuracy, they have become prominent methods because they use past experience to learn optimal policies. In this study, three reinforcement learning algorithms, DDPG, TD3 and SAC, are used to train the Fetch robotic manipulator on four different tasks in the MuJoCo simulation environment. All of these algorithms are off-policy and able to achieve their desired target by optimizing both policy and value functions. In the current study, the efficiency and speed of these three algorithms are analyzed in a controlled environment.
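
    As a rough illustration of how such an off-policy agent can be trained on a goal-conditioned Fetch task with commonly used open-source tooling (Stable-Baselines3 and Gymnasium-Robotics), here is a minimal sketch; the environment ID, replay settings, and training budget are assumptions rather than the paper's configuration.

```python
import gymnasium as gym
import gymnasium_robotics  # registers the Fetch manipulation tasks on import (version-dependent)
from stable_baselines3 import SAC, HerReplayBuffer

# Goal-conditioned Fetch task; the exact environment ID/version is an assumption.
env = gym.make("FetchReach-v2")

# Off-policy SAC with hindsight experience replay, a common recipe for sparse-reward Fetch tasks.
model = SAC(
    "MultiInputPolicy",
    env,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(n_sampled_goal=4, goal_selection_strategy="future"),
    learning_rate=3e-4,
    buffer_size=1_000_000,
    verbose=1,
)
model.learn(total_timesteps=200_000)   # illustrative budget, not the paper's
model.save("sac_fetch_reach")
```

    Swapping `SAC` for `DDPG` or `TD3` in this sketch changes only the imported class, which is what makes the three algorithms easy to compare under identical task settings.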

    Tuning scaling factors of fuzzy logic controllers via reinforcement learning policy gradient algorithms

    Get PDF
    In this study, a gain-scheduling method for the scaling factors of the input variables of a fuzzy logic controller, based on policy gradient (PG) reinforcement learning algorithms, is proposed. The motivation for using PG algorithms is that they scale to RL problems with continuous, high-dimensional state-action spaces without the need for function approximation methods. Without incorporating any a priori knowledge of the plant, the proposed method optimizes the cost function of the learning algorithm and seeks optimal values for the scaling factors of the fuzzy logic controller. To show the effectiveness of the proposed method, it is applied to a PD-type fuzzy controller along with a nonlinear model of an inverted pendulum. Simulations show that the proposed method can find optimal solutions within a small number of learning iterations.
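
    One simple way to realize such gradient-based tuning is a parameter-space (REINFORCE-style) update of a Gaussian search distribution over the two input scaling factors; the closed-loop simulation and cost below are placeholders, and the exact PG variant used in the study may differ.

```python
import numpy as np

def run_episode(scaling_factors):
    """Placeholder: simulate the closed loop with a fuzzy PD controller whose
    error / change-of-error inputs are scaled by `scaling_factors`, and return
    a scalar cost (e.g. integral of squared error). Hypothetical stand-in."""
    Ke, Kde = scaling_factors
    return (Ke - 2.0) ** 2 + (Kde - 0.5) ** 2 + 0.01 * np.random.randn()

mu = np.array([1.0, 1.0])      # mean of Gaussian search distribution over (Ke, Kde)
sigma = 0.2                    # fixed exploration noise
alpha = 0.05                   # learning rate
baseline = None

for it in range(200):
    theta = mu + sigma * np.random.randn(2)          # sample candidate scaling factors
    cost = run_episode(theta)
    baseline = cost if baseline is None else 0.9 * baseline + 0.1 * cost
    # REINFORCE-style update on the (negative-cost) objective with a moving baseline
    grad_logp = (theta - mu) / sigma**2              # gradient of log N(theta; mu, sigma^2 I)
    mu -= alpha * (cost - baseline) * grad_logp      # lower-than-baseline cost pulls mu toward theta

print("tuned scaling factors:", mu)
```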

    Energy Optimization of Wind Turbines via a Neural Control Policy Based on Reinforcement Learning Markov Chain Monte Carlo Algorithm

    Full text link
    The primary focus of this paper is the numerical analysis and optimal control of vertical-axis wind turbines (VAWT) using Bayesian reinforcement learning (RL). We specifically address small-scale wind turbines with a permanent magnet synchronous generator, which are well suited to local, compact production of electrical energy, such as urban and rural infrastructure installations. In this work, we formulate and implement an RL strategy based on a Markov chain Monte Carlo (MCMC) algorithm to optimize the long-term energy output of the wind turbine. Our MCMC-based RL algorithm is model-free and gradient-free: the designer does not have to know the precise dynamics of the plant or its uncertainties. The method specifically overcomes shortcomings typically associated with conventional solutions, including but not limited to component aging, modeling errors and inaccuracies in the estimation of wind speed patterns. It has been observed to be especially successful in capturing power from wind transients; it modulates the generator load, and hence the rotor torque load, so that the rotor tip speed reaches the optimum value for the anticipated wind speed. The ratio of rotor tip speed to wind speed is known to be critical in wind power applications. The wind-to-load energy efficiency of the proposed method is shown to be superior to the classical maximum power point tracking method.
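
    The quantity the control policy is steering in this description is the tip-speed ratio. The sketch below records the textbook relations between rotor speed, wind speed, captured power, and a standard optimal-torque load rule of the kind the learned policy is compared against; all numerical values and the Cp figure are placeholders, not the turbine studied in the paper.

```python
import numpy as np

RHO = 1.225        # air density [kg/m^3]
R = 1.0            # rotor radius [m] (placeholder for a small-scale turbine)
A = np.pi * R**2   # swept area [m^2] (disc area; an actual VAWT uses diameter x height)
LAMBDA_OPT = 4.0   # optimal tip-speed ratio (placeholder)
CP_MAX = 0.35      # peak power coefficient (placeholder)

def tip_speed_ratio(omega, v_wind):
    """lambda = omega * R / v : ratio of blade tip speed to wind speed."""
    return omega * R / v_wind

def aero_power(v_wind, cp):
    """Captured aerodynamic power P = 0.5 * rho * A * Cp * v^3."""
    return 0.5 * RHO * A * cp * v_wind**3

def optimal_torque_load(omega):
    """Classical MPPT-style load rule T = k * omega^2 with
    k = 0.5 * rho * A * R^3 * Cp_max / lambda_opt^3; the MCMC-trained policy in
    the paper replaces such a fixed rule with a learned generator-load policy."""
    k = 0.5 * RHO * A * R**3 * CP_MAX / LAMBDA_OPT**3
    return k * omega**2

v, omega = 8.0, 30.0
print(tip_speed_ratio(omega, v), aero_power(v, CP_MAX), optimal_torque_load(omega))
```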

    Classical and intelligent methods in model extraction and stabilization of a dual-axis reaction wheel pendulum: A comparative study

    Get PDF
    Controlling underactuated, open-loop unstable systems is challenging. In this study, first, both nonlinear and linear models of a dual-axis reaction wheel pendulum (DA-RWP) are extracted by employing Lagrangian equations, which are based on energy methods. Then, to control the system and stabilize the pendulum's angle in the upright position, fuzzy logic based controllers for both the x and y directions are developed. To show the efficiency of the designed intelligent controller, comparisons are made with its classical optimal control counterparts. In our simulations, as proof of the reliability and robustness of the fuzzy controller, two scenarios are considered: one free of noise and disturbance, and one with noise and disturbance. The comparisons made between the classical and fuzzy-based controllers reveal the superiority of the proposed fuzzy logic controller in terms of time response. The simulation results of our experiments, in terms of both mathematical modeling and control, can serve as a baseline for robotics and aerospace studies such as developing walking humanoid robots and satellite attitude control systems, respectively. The work of U.F.-G. was supported by the government of the Basque Country through the ELKARTEK21/10 KK-2021/00014 and ELKARTEK22/85 research programs.
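
    As an illustration of the classical optimal-control counterpart mentioned above, the sketch below designs an LQR state-feedback gain for one linearized axis; the state-space matrices and weights are placeholders, not the DA-RWP model derived in the paper.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Placeholder linearized single-axis model with state x = [pendulum angle,
# pendulum rate, wheel speed]; the actual DA-RWP matrices come from the
# Lagrangian derivation in the paper.
A = np.array([[0.0, 1.0, 0.0],
              [50.0, 0.0, 0.1],
              [0.0, 0.0, -0.5]])
B = np.array([[0.0],
              [-1.0],
              [20.0]])

Q = np.diag([100.0, 1.0, 0.01])   # penalize angle error most heavily
R = np.array([[0.1]])             # penalize reaction-wheel motor torque

# Continuous-time algebraic Riccati equation and state-feedback gain u = -K x
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)
print("LQR gain:", K)
```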

    Fuzzified Q-learning Algorithm In The Design Of Fuzzy Pid Controller

    Get PDF
    Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 2013.
    In this study, a method is proposed that tunes the input and output membership function parameters of fuzzy logic controllers based on the Q-learning algorithm, so that a closed-loop system maximizes or minimizes a given performance criterion. The parameter vectors to be optimized for tuning the membership functions of the fuzzy logic controller are chosen as those of the controller inputs, namely the error and the change of error, and of the output. For each membership function parameter, several competing candidates are defined, and a Q-value is assigned to each candidate. These Q-values are updated step by step by the Q-learning algorithm, so that the learning procedure determines the best set of membership function parameters. Since the membership function parameter values are initially unknown, the fuzzy controller has to be run and tested under various conditions. This exploration phase is often long, but it can be shortened if the tuned parameters carry a physical meaning. Using the performance criterion, the effectiveness of the fuzzy controller is quantified at the end of each step response obtained with different parameter values; in this study, the integral of squared error is used as the performance criterion. Here, for the first time in the literature, a fuzzy multi-valued assignment is used in the reward function of the Q-learning algorithm instead of assigning scalar values. This makes the learning algorithm more sensitive and, as a result, accelerates convergence. When the proposed fuzzified Q-learning algorithm is used to tune the membership functions of the fuzzy controller, the errors in the system responses decrease and the performance criterion reaches much smaller values.
    In this study, we propose a sophisticated reward function for the Q-learning (QL) algorithm that incorporates a fuzzy structure, conveying more elaborate information about the rewards and punishments assigned to each action taken at each time step. First, we apply the proposed algorithm to two distinct second-order linear systems, one with time delay and the other without, and obtain the corresponding unit step responses. The results demonstrate an improvement in system performance compared with fuzzy controllers without tuning schemes. Next, to show the effectiveness of the proposed method, we apply the algorithm to a nonlinear system, an inverted pendulum, with the goal of balancing it in the vertical position. The resulting simulations show that the balancing time is considerably reduced in comparison with controlling the system with a non-tuned fuzzy controller.
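
    The distinctive ingredient described above is the graded, fuzzy-valued reward replacing a crisp scalar assignment. Below is a minimal sketch of what such a reward, and the tabular Q-update that consumes it, could look like; the membership shapes, breakpoints, and learning constants are illustrative assumptions, not the thesis values.

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    return float(max(min((x - a) / (b - a + 1e-12), (c - x) / (c - b + 1e-12)), 0.0))

def fuzzified_reward(error, d_error):
    """Graded, multi-valued reward instead of a crisp +1 / -1 assignment.
    The 'small' / 'large' breakpoints are illustrative, not the thesis values."""
    small_e = tri(abs(error), -0.1, 0.0, 0.5)
    large_e = tri(abs(error), 0.3, 1.0, 2.0)
    improving = 1.0 if error * d_error < 0 else 0.0   # error currently moving toward zero
    # Weighted (defuzzified) combination of rewarding and punishing conditions
    return small_e * 1.0 + improving * 0.5 - large_e * 1.0

def q_update(Q, s, a, s_next, error, d_error, alpha=0.1, gamma=0.95):
    """Tabular Q-learning step using the graded reward (alpha, gamma are placeholders)."""
    r = fuzzified_reward(error, d_error)
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
    return r
```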

    Bayesian learning for policy search in trajectory control of a planar manipulator

    No full text
    Application of learning algorithms to robotics and control problems with highly nonlinear dynamics, in order to obtain a plausible control policy in a continuous state space, is expected to greatly facilitate the design process. Recently, policy search methods such as policy gradient in Reinforcement Learning (RL) have succeeded in coping with such complex systems. Nevertheless, they are slow to converge and prone to getting stuck in local optima. To alleviate this, a Bayesian inference method based on Markov Chain Monte Carlo (MCMC), utilizing a multiplicative reward function, is proposed. This study compares eNAC, a popular gradient-based RL method, with the proposed Bayesian learning method, where the objective is trajectory control of a complex model of a 2-DOF planar manipulator. The results obtained for the convergence speed of the proposed algorithm and its time-response performance illustrate that the proposed MCMC algorithm is well suited to complex problems in robotics.
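
    The multiplicative reward referred to above can be made concrete with a common exponential per-step form (an assumption here, not necessarily the exact function used in the study):

```python
import numpy as np

def multiplicative_return(joint_errors, beta=5.0):
    """Return R = prod_t exp(-beta * ||e_t||^2) = exp(-beta * sum_t ||e_t||^2).
    Each factor lies in (0, 1], so R behaves like an unnormalized likelihood,
    which is what lets MCMC/SMC machinery be reused for policy search.
    The exponential form and beta are illustrative assumptions."""
    joint_errors = np.asarray(joint_errors)               # shape (T, n_joints)
    return float(np.exp(-beta * np.sum(joint_errors**2)))

# Example: 2-DOF trajectory-tracking errors over a short horizon
errors = [[0.10, 0.05], [0.06, 0.02], [0.01, 0.00]]
print(multiplicative_return(errors))
```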

    A real-world application of Markov chain Monte Carlo method for Bayesian trajectory control of a robotic manipulator

    No full text
    Reinforcement learning methods are being applied to control problems in the robotics domain. These algorithms are well suited to the continuous, large-scale state spaces encountered in robotics. Even though policy search methods related to stochastic gradient optimization algorithms have become successful candidates for coping with challenging robotics and control problems in recent years, they may become unstable when abrupt variations occur in gradient computations, and they may end up with a locally optimal solution. To avoid these disadvantages, a Markov chain Monte Carlo (MCMC) algorithm for policy learning in the RL setting is proposed. The policy space is explored in a non-contiguous manner such that higher-reward regions have a higher probability of being visited. The proposed algorithm is applied in a risk-sensitive setting where the reward structure is multiplicative. Our method has the advantages of being model-free and gradient-free, as well as being suitable for real-world implementation. The merits of the proposed algorithm are shown through experimental evaluations on a 2-Degree-of-Freedom robot arm. The experiments demonstrate that it can perform a thorough policy space search while maintaining adequate control performance, and can learn a complex trajectory control task within a small number of iterations.

    A Markov chain Monte Carlo algorithm for Bayesian policy search

    No full text
    Policy search algorithms have facilitated the application of Reinforcement Learning (RL) to dynamic systems, such as control of robots. Many policy search algorithms are based on the policy gradient and thus may suffer from slow convergence or local optima. In this paper, we take a Bayesian approach to policy search under the RL paradigm, for the problem of controlling a discrete-time Markov decision process with continuous state and action spaces and a multiplicative reward structure. For this purpose, we assume a prior over the policy parameters and target the 'posterior' distribution whose 'likelihood' is the expected reward. We propose a Markov chain Monte Carlo algorithm as a method of generating samples of the policy parameters from this posterior. The proposed algorithm is compared with well-known policy gradient-based RL methods and exhibits favourable performance in terms of time response and convergence rate when applied to a nonlinear model of a Cart-Pole benchmark.
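
    To make the construction concrete in symbols (the notation is ours, not taken from the paper): with a prior p(θ) over the policy parameters, a 'likelihood' J(θ) equal to the expected multiplicative reward, and a proposal q, the chain targets the posterior and accepts a candidate θ′ with the usual Metropolis-Hastings probability,

```latex
\pi(\theta) \;\propto\; p(\theta)\, J(\theta),
\qquad
J(\theta) \;=\; \mathbb{E}\!\Big[\prod_{t=1}^{T} r(x_t,u_t) \,\Big|\, \theta\Big],
\qquad
\alpha(\theta' \mid \theta) \;=\; \min\!\Big\{1,\;
  \frac{p(\theta')\,\hat{J}(\theta')\,q(\theta \mid \theta')}
       {p(\theta)\,\hat{J}(\theta)\,q(\theta' \mid \theta)}\Big\},
```

    where Ĵ denotes a Monte Carlo estimate of J. Accepted samples therefore concentrate on high-expected-reward policies, while the prior regularizes the search.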

    Fuzzy PID controller design using q-learning algorithm with a manipulated reward function

    No full text
    In this paper, we propose a manipulated reward function for the Q-learning algorithm, a reinforcement learning technique, and use the proposed algorithm to tune the parameters of the input and output membership functions of fuzzy logic controllers. The use of a reward signal to formalize the idea of a goal is one of the most distinctive features of reinforcement learning. To improve both the performance and the convergence of the algorithm, we propose a fuzzy structure for the reward function. To demonstrate the effectiveness of the algorithm, we apply it to two second-order linear systems, one with and one without time delay, and finally to a nonlinear system.
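
    A minimal sketch of how such a tuning loop might choose among competing candidate values for a single membership-function parameter, assuming a discretized candidate set and a placeholder closed-loop performance score (both are illustrative assumptions, not the paper's exact setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Competing candidate values for one membership-function parameter (assumed grid)
candidates = np.linspace(0.2, 2.0, 10)
q_values = np.zeros(len(candidates))      # one Q-value per competing candidate
alpha, epsilon = 0.2, 0.1

def run_step_response(param_value):
    """Placeholder closed-loop experiment: returns a performance score
    (e.g. negative integral of squared error) for the chosen parameter."""
    return -(param_value - 1.3) ** 2 + 0.01 * rng.normal()

for episode in range(300):
    # epsilon-greedy choice among the competing candidates
    a = rng.integers(len(candidates)) if rng.uniform() < epsilon else int(np.argmax(q_values))
    score = run_step_response(candidates[a])
    # stateless (bandit-style) Q update toward the observed performance
    q_values[a] += alpha * (score - q_values[a])

print("selected parameter:", candidates[int(np.argmax(q_values))])
```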